20 research outputs found
Comprehensive analysis of normal adjacent to tumor transcriptomes.
Histologically normal tissue adjacent to the tumor (NAT) is commonly used as a control in cancer studies. However, little is known about the transcriptomic profile of NAT, how it is influenced by the tumor, and how the profile compares with non-tumor-bearing tissues. Here, we integrate data from the Genotype-Tissue Expression project and The Cancer Genome Atlas to comprehensively analyze the transcriptomes of healthy, NAT, and tumor tissues in 6506 samples across eight tissues and corresponding tumor types. Our analysis shows that NAT presents a unique intermediate state between healthy and tumor. Differential gene expression and protein-protein interaction analyses reveal altered pathways shared among NATs across tissue types. We characterize a set of 18 genes that are specifically activated in NATs. By applying pathway and tissue composition analyses, we suggest a pan-cancer mechanism in which pro-inflammatory signals from the tumor stimulate an inflammatory response in the adjacent endothelium.
ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data.
Objectives: Electronic health record (EHR) data are increasingly used for biomedical discoveries. The nature of the data, however, requires expertise in both data science and EHR structure. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) standardizes the language and structure of EHR data to promote interoperability of EHR data for research. While the OMOP CDM is valuable and more attuned to research purposes, it still requires extensive domain knowledge to utilize effectively, potentially limiting more widespread adoption of EHR data for research and quality improvement. Materials and methods: We have created ROMOP: an R package for direct interfacing with EHR data in the OMOP CDM format. Results: ROMOP streamlines typical EHR-related data processes. Its functions include exploration of data types, extraction and summarization of patient clinical and demographic data, and patient searches using any CDM vocabulary concept. Conclusion: ROMOP is freely available under the Massachusetts Institute of Technology (MIT) license and can be obtained from GitHub (http://github.com/BenGlicksberg/ROMOP). We detail instructions for setup and use in the Supplementary Materials. Additionally, we provide a public sandbox server containing synthesized clinical data for users to explore OMOP data and ROMOP (http://romop.ucsf.edu).
Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes.
There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained on medical text corpora that are too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed a customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.
PatientExploreR: an extensible application for dynamic visualization of patient clinical history from electronic health records in the OMOP common data model.
Motivation: Electronic health records (EHRs) are quickly becoming omnipresent in healthcare, but interoperability issues and technical demands limit their use for biomedical and clinical research. Interactive and flexible software that interfaces directly with EHR data structured around a common data model (CDM) could accelerate more EHR-based research by making the data more accessible to researchers who lack computational expertise and/or domain knowledge. Results: We present PatientExploreR, an extensible application built on the R/Shiny framework that interfaces with a relational database of EHR data in the Observational Medical Outcomes Partnership CDM format. PatientExploreR produces patient-level interactive and dynamic reports and facilitates visualization of clinical data without any programming required. It allows researchers to easily construct and export patient cohorts from the EHR for analysis with other software. This application could enable easier exploration of patient-level data for physicians and researchers. PatientExploreR can incorporate EHR data from any institution that employs the CDM for users with approved access. The software code is free and open source under the MIT license, enabling institutions to install and users to expand and modify the application for their own purposes. Availability and implementation: PatientExploreR can be freely obtained from GitHub: https://github.com/BenGlicksberg/PatientExploreR. We provide instructions for how researchers with approved access to their institutional EHR can use this package. We also release an open sandbox server of synthesized patient data for users without EHR access to explore: http://patientexplorer.ucsf.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research
Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operating characteristic curve (AUROC) scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translation of microbiome data into clinically relevant predictive models and to better understand preterm birth.
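For readers unfamiliar with the metric reported above, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The following is a minimal Python sketch of that definition only; the function name `auroc` and the toy labels and scores are illustrative assumptions, not code or data from the challenge.

```python
def auroc(labels, scores):
    """Pairwise (rank-based) AUROC: fraction of positive/negative pairs
    in which the positive case is scored higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = preterm, 0 = term; scores are model outputs.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(auroc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported scores of 0.69 and 0.87.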
Systematic identification of ACE2 expression modulators reveals cardiomyopathy as a risk factor for mortality in COVID-19 patients.
Background: Angiotensin-converting enzyme 2 (ACE2) is the cell-entry receptor for SARS-CoV-2. It plays critical roles in both the transmission and the pathogenesis of COVID-19. Comprehensive profiling of ACE2 expression patterns could reveal risk factors of severe COVID-19 illness. While the expression of ACE2 in healthy human tissues has been well characterized, it is not known which diseases and drugs might be associated with ACE2 expression. Results: We develop GENEVA (GENe Expression Variance Analysis), a semi-automated framework for exploring massive amounts of RNA-seq datasets. We apply GENEVA to 286,650 publicly available RNA-seq samples to identify any previously studied experimental conditions that could be directly or indirectly associated with ACE2 expression. We identify multiple drugs, genetic perturbations, and diseases that are associated with the expression of ACE2, including cardiomyopathy, HNF1A overexpression, and drug treatments with RAD140 and itraconazole. Our joint analysis of seven datasets confirms ACE2 upregulation in all cardiomyopathy categories. Using electronic health records data from 3936 COVID-19 patients, we demonstrate that patients with pre-existing cardiomyopathy have a higher mortality risk than age-matched patients with other cardiovascular conditions. GENEVA is applicable to any genes of interest and is freely accessible at http://genevatool.org. Conclusions: This study identifies multiple diseases and drugs that are associated with the expression of ACE2. The effect of these conditions should be carefully studied in COVID-19 patients. In particular, our analysis identifies cardiomyopathy patients as a high-risk group, with increased ACE2 expression in the heart and increased mortality after SARS-CoV-2 infection.
Mortality Risk Among Patients With COVID-19 Prescribed Selective Serotonin Reuptake Inhibitor Antidepressants
Importance: Antidepressant use may be associated with reduced levels of several proinflammatory cytokines suggested to be involved with the development of severe COVID-19. An association between the use of selective serotonin reuptake inhibitors (SSRIs), specifically fluoxetine hydrochloride and fluvoxamine maleate, with decreased mortality among patients with COVID-19 has been reported in recent studies; however, these studies had limited power due to their small size. Objective: To investigate the association of SSRIs with outcomes in patients with COVID-19 by analyzing electronic health records (EHRs). Design, setting, and participants: This retrospective cohort study used propensity score matching by demographic characteristics, comorbidities, and medication indication to compare SSRI-treated patients with matched control patients not treated with SSRIs within a large EHR database representing a diverse population of 83 584 patients diagnosed with COVID-19 from January to September 2020 and with a duration of follow-up of as long as 8 months in 87 health care centers across the US. Exposures: Selective serotonin reuptake inhibitors and specifically (1) fluoxetine, (2) fluoxetine or fluvoxamine, and (3) other SSRIs (ie, not fluoxetine or fluvoxamine). Main outcomes and measures: Death. Results: A total of 3401 adult patients with COVID-19 prescribed SSRIs (2033 women [59.8%]; mean [SD] age, 63.8 [18.1] years) were identified, with 470 receiving fluoxetine only (280 women [59.6%]; mean [SD] age, 58.5 [18.1] years), 481 receiving fluoxetine or fluvoxamine (285 women [59.3%]; mean [SD] age, 58.7 [18.0] years), and 2898 receiving other SSRIs (1733 women [59.8%]; mean [SD] age, 64.7 [18.0] years) within a defined time frame.
When compared with matched untreated control patients, relative risk (RR) of mortality was reduced among patients prescribed any SSRI (497 of 3401 [14.6%] vs 1130 of 6802 [16.6%]; RR, 0.92 [95% CI, 0.85-0.99]; adjusted P = .03); fluoxetine (46 of 470 [9.8%] vs 937 of 7050 [13.3%]; RR, 0.72 [95% CI, 0.54-0.97]; adjusted P = .03); and fluoxetine or fluvoxamine (48 of 481 [10.0%] vs 956 of 7215 [13.3%]; RR, 0.74 [95% CI, 0.55-0.99]; adjusted P = .04). The association between receiving any SSRI that is not fluoxetine or fluvoxamine and risk of death was not statistically significant (447 of 2898 [15.4%] vs 1474 of 8694 [17.0%]; RR, 0.92 [95% CI, 0.84-1.00]; adjusted P = .06). Conclusions and relevance: These results support evidence that SSRIs may be associated with reduced severity of COVID-19 reflected in the reduced RR of mortality. Further research and randomized clinical trials are needed to elucidate the effect of SSRIs generally, or more specifically of fluoxetine and fluvoxamine, on the severity of COVID-19 outcomes.
Comparing Ethnicity-Specific Reference Intervals for Clinical Laboratory Tests from EHR Data
Background: The results of clinical laboratory tests are an essential component of medical decision-making. To guide interpretation, test results are returned with reference intervals defined by the range in which the central 95% of values occur in healthy individuals. Clinical laboratories often set their own reference intervals to accommodate variation in local population and instrumentation. For some tests, reference intervals change as a function of sex, age, and self-identified race and ethnicity. Methods: In this work, we develop a novel approach, which leverages electronic health record data, to identify healthy individuals and test for differences in laboratory test values between populations. Results: We found that the distributions of >50% of laboratory tests with currently fixed reference intervals differ among self-identified racial and ethnic groups (SIREs) in healthy individuals. Conclusions: Our results confirm the known SIRE-specific differences in creatinine and suggest that more research needs to be done to determine the clinical implications of using one-size-fits-all reference intervals for other tests with SIRE-specific distributions.
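The definition of a reference interval given in this abstract (the range containing the central 95% of values in healthy individuals) corresponds to the 2.5th and 97.5th percentiles of the healthy-population distribution. The sketch below illustrates only that definition with Python's standard library; the function name `reference_interval` and the synthetic values are assumptions for illustration and do not reproduce the study's method for identifying healthy individuals or comparing groups.

```python
from statistics import quantiles


def reference_interval(values):
    """Return the (2.5th, 97.5th) percentile pair of the given values,
    i.e. the bounds of the central 95% of the distribution."""
    # n=40 splits the data into 40 equal-probability groups, so the first
    # cut point is the 2.5th percentile and the last is the 97.5th.
    cuts = quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Synthetic, evenly spaced values standing in for healthy-individual results.
synthetic_results = list(range(201))  # 0, 1, ..., 200
low, high = reference_interval(synthetic_results)
print(low, high)  # 5.0 195.0
```

Computing this pair separately for each population group, then comparing the resulting bounds, is one simple way to see whether a single fixed interval fits all groups.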
A certified de-identification system for all clinical text documents for information extraction at scale.
Objectives: Clinical notes are a veritable treasure trove of information on a patient's disease progression, medical history, and treatment plans, yet are locked in secured databases accessible for research only after extensive ethics review. Removing personally identifying and protected health information (PII/PHI) from the records can reduce the need for additional Institutional Review Board (IRB) reviews. In this project, our goals were to: (1) develop a robust and scalable clinical text de-identification pipeline that is compliant with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule for de-identification standards and (2) share routinely updated de-identified clinical notes with researchers. Materials and methods: Building on our open-source de-identification software called Philter, we added features to: (1) make the algorithm and the de-identified data HIPAA compliant, which also implies type 2 error-free redaction, as certified via external audit; (2) reduce over-redaction errors; and (3) normalize and shift date PHI. We also established a streamlined de-identification pipeline using MongoDB to automatically extract clinical notes and provide truly de-identified notes to researchers with periodic monthly refreshes at our institution. Results: To the best of our knowledge, the Philter V1.0 pipeline is currently the first and only certified, de-identified redaction pipeline that makes clinical notes available to researchers for nonhuman subjects' research, without further IRB approval needed. To date, we have made over 130 million certified de-identified clinical notes available to over 600 UCSF researchers. These notes were collected over the past 40 years, and represent data from 2,757,016 UCSF patients.
Deep phenotyping of Alzheimer's disease leveraging electronic medical records identifies sex-specific clinical associations.
Alzheimer's Disease (AD) is a neurodegenerative disorder that is still not fully understood. Sex modifies AD vulnerability, but the reasons for this are largely unknown. We utilize two independent electronic medical record (EMR) systems across 44,288 patients to perform deep clinical phenotyping and network analysis to gain insight into clinical characteristics and sex-specific clinical associations in AD. Embeddings and network representation of patient diagnoses demonstrate greater comorbidity interactions in AD in comparison to matched controls. Enrichment analysis identifies multiple known and new diagnostic, medication, and lab result associations across the whole cohort and in a sex-stratified analysis. With this data-driven method of phenotyping, we can represent AD complexity and generate hypotheses of clinical factors that can be followed up for further diagnostic and predictive analyses, mechanistic understanding, or drug repurposing and therapeutic approaches.